| Aspect | Frequentist | Bayesian |
|---|---|---|
| Definition of Probability | Long-run frequency | Degree of belief |
| Parameters | Fixed, unknown constants | Random variables with distributions |
| Prior Knowledge | Not formally incorporated | Explicitly incorporated via priors |
| Output | p-values, confidence intervals | Posterior distributions, credible intervals |
| Interpretation | Based on hypothetical repetition | Direct probability statements |
9 Week 9: Statistical Foundations and Study Design
Introduction to Statistics for Animal Science
10 Introduction: Why Statistics Matter in Animal Science
Imagine you’re a swine nutritionist testing a new feed additive that claims to improve growth rates. After a 90-day trial, you observe that pigs fed the new additive weigh an average of 5 kg more than the control group. Is this difference real, or just random variation? Should you recommend this expensive additive to producers?
Or perhaps you’re a beef geneticist comparing two breeding programs. Bulls from Program A seem to produce offspring with slightly better marbling scores. But is the difference large enough to justify changing breeding protocols?
These are the types of questions statistics helps us answer. Statistics is fundamentally about making decisions in the presence of uncertainty. In animal science, we deal with biological variation constantly—no two animals are exactly alike, even if they’re raised identically. Statistics gives us a framework to:
- Quantify patterns and relationships in our data
- Distinguish real effects from random noise
- Make inferences about populations based on samples
- Communicate our findings with appropriate levels of confidence
In this course, we’ll build your statistical toolkit step by step, always connecting concepts back to real problems in animal agriculture.
Statistics won’t give you perfect certainty—that’s impossible in biology. Instead, it helps you quantify how confident you should be in your conclusions and communicate that uncertainty honestly.
11 The Two Statistical Philosophies
Before diving into specific methods, it’s important to understand that there are two major frameworks for thinking about probability and inference: Frequentist and Bayesian statistics. While this course focuses on frequentist methods (the dominant approach in agricultural sciences), being aware of both perspectives will make you a more sophisticated consumer of research.
11.1 Frequentist Statistics
The frequentist approach defines probability as long-run frequency. If we say “the probability of getting heads is 0.5,” we mean that if we flipped a coin infinitely many times, about half would be heads.
11.1.1 Key Principles
Parameters are fixed but unknown: The true average weight of pigs on a new diet is a fixed number—we just don’t know it. Our job is to estimate it from data.
Probability describes data, not hypotheses: We calculate “the probability of observing data this extreme if the null hypothesis were true,” NOT “the probability that the hypothesis is true.”
Repetition is key: Frequentist inference imagines repeating the same experiment many times. Confidence intervals and p-values only make sense in this framework of repeated sampling.
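This repeated-sampling idea can be checked by simulation. A minimal sketch (with assumed values: true mean 600 kg, sd 40 kg, n = 25) showing that roughly 95% of 95% confidence intervals cover the true mean across many hypothetical repetitions of the same experiment:

```r
# Repeated-sampling interpretation of a 95% CI: simulate many experiments
# and count how often the interval covers the true (known) mean.
set.seed(1)
true_mean <- 600  # kg (assumed for illustration)
covered <- replicate(5000, {
  x  <- rnorm(25, mean = true_mean, sd = 40)
  ci <- t.test(x)$conf.int          # 95% CI from one simulated experiment
  ci[1] <= true_mean && true_mean <= ci[2]
})
mean(covered)  # close to 0.95 by construction
```

No single interval is "95% likely" to contain the mean; the 95% describes the long-run behavior of the procedure.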
11.1.2 Example: Feed Trial
Suppose we test a new feed supplement in pigs. The frequentist asks:
“If this supplement truly had no effect (null hypothesis), what’s the probability we’d observe a difference this large just by chance?”
If that probability is very small (say, p < 0.05), we conclude the data are incompatible with the null hypothesis, and we reject it in favor of the alternative (the supplement does have an effect).
A p-value of 0.03 does NOT mean “there’s a 3% chance the null hypothesis is true.” It means “if the null were true, we’d see data this extreme only 3% of the time by chance alone.”
11.2 Bayesian Statistics
The Bayesian approach defines probability as degree of belief. It explicitly incorporates prior knowledge and updates that knowledge with data.
11.2.1 Key Principles
Parameters have probability distributions: Instead of saying “the true effect is unknown,” Bayesians say “our belief about the effect can be described by a probability distribution.”
Prior + Data = Posterior: Bayesian analysis combines:
- Prior: What we believed before seeing the data
- Likelihood: What the data tell us
- Posterior: Updated beliefs after seeing the data
Direct probability statements about hypotheses: Bayesians can say things like “there’s an 85% probability the effect is positive” or “the treatment effect is between 2 and 8 kg with 95% probability.”
11.2.2 Bayes’ Theorem
The mathematical foundation of Bayesian statistics is Bayes’ Theorem:
\[ P(\theta | \text{data}) = \frac{P(\text{data} | \theta) \times P(\theta)}{P(\text{data})} \]
Where:
- \(P(\theta | \text{data})\) = Posterior: Our updated belief about parameter \(\theta\) after seeing the data
- \(P(\text{data} | \theta)\) = Likelihood: How probable our data are under different values of \(\theta\)
- \(P(\theta)\) = Prior: Our belief about \(\theta\) before seeing the data
- \(P(\text{data})\) = Marginal likelihood: A normalizing constant (probability of data across all possible \(\theta\))
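As a concrete sketch, Bayes' Theorem can be applied on a discrete grid of candidate effect sizes. All numbers here are illustrative, not from a real trial:

```r
# Grid approximation of Bayes' Theorem: posterior ∝ likelihood × prior.
theta <- seq(0, 10, by = 0.1)             # candidate supplement effects (kg)
prior <- dnorm(theta, mean = 4, sd = 2)   # prior: most mass near 3-5 kg
prior <- prior / sum(prior)               # normalize over the grid
# Suppose the trial estimated a 5 kg gain with a standard error of 1.5 kg
likelihood <- dnorm(5, mean = theta, sd = 1.5)
posterior  <- likelihood * prior / sum(likelihood * prior)  # denominator = P(data)
sum(posterior[theta >= 2])   # posterior probability the effect is at least 2 kg
```

Dividing by `sum(likelihood * prior)` is the marginal likelihood \(P(\text{data})\) in action: it simply rescales the posterior to sum to 1.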
11.2.3 Example: Same Feed Trial, Bayesian Perspective
A Bayesian might start with prior knowledge: “Previous studies suggest feed supplements increase growth by 0-10 kg, with most around 3-5 kg.” After seeing the data, they update this prior to a posterior distribution and can make statements like:
“Based on our data, there’s a 92% probability that the supplement increases weight by at least 2 kg, and a 70% probability the increase is between 4 and 8 kg.”
11.3 Comparing the Approaches
Frequentist methods are:
- More commonly used in animal science journals
- Required by many regulatory bodies (FDA, EPA)
- Computationally simpler for basic analyses
- The foundation for most statistical software defaults
However, Bayesian methods are growing in popularity, especially for complex models. Being fluent in frequentist thinking first makes learning Bayesian approaches easier later.
12 Understanding P-Values
P-values are perhaps the most misunderstood concept in statistics. Let’s build a proper understanding from the ground up.
12.1 Definition and Meaning
A p-value is defined as:
\[ p = P(\text{data as extreme or more extreme} \mid H_0 \text{ is true}) \]
Where:
- \(P(\cdot)\) = Probability
- \(\mid\) = “given that” or “conditional on”
- \(H_0\) = The null hypothesis (typically “no effect” or “no difference”)
In plain English: The p-value is the probability of observing results at least as extreme as what we actually observed, assuming the null hypothesis is true.
12.1.1 Breaking Down the Definition
Let’s unpack each part:
- “Probability of observing results…” – We’re talking about data, not hypotheses
- “…at least as extreme…” – Not just exactly what we saw, but anything further from what we’d expect under the null
- “…assuming the null hypothesis is true” – This is a conditional probability; we’re starting with an assumption
What the p-value is NOT:
- ❌ The probability that the null hypothesis is true: \(P(H_0 | \text{data})\)
- ❌ The probability that the result occurred by chance
- ❌ The probability of making a wrong decision
- ❌ The size or importance of an effect
- ❌ The probability that replicating the study would give the same result
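To make the conditional-probability definition concrete, here is a sketch (with made-up weights) of computing a two-sided p-value "by hand" and checking it against R's `t.test()`:

```r
# Welch two-sample t-test by hand: t statistic, then the two-sided
# tail probability under H0. Data are illustrative, not from a real trial.
control   <- c(610, 595, 602, 588, 615, 590)   # kg
treatment <- c(618, 605, 630, 612, 598, 625)   # kg
tt <- t.test(treatment, control)
se     <- sqrt(var(treatment)/length(treatment) + var(control)/length(control))
t_stat <- (mean(treatment) - mean(control)) / se
# P(result at least this extreme | H0 true), both tails
p_manual <- 2 * pt(-abs(t_stat), df = tt$parameter)
all.equal(unname(tt$p.value), unname(p_manual))  # TRUE
```

The key line is `2 * pt(-abs(t_stat), ...)`: the probability is computed under the null distribution, which is what "given \(H_0\) is true" means.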
12.2 Visualizing What P-Values Mean
Let’s simulate a situation to build intuition. Suppose we’re comparing two groups of beef cattle (Control vs Treatment), and in reality, there’s no true difference between them (null hypothesis is true).
Code
# Load packages used throughout (assumed available in the course environment)
library(tidyverse)  # tibble, dplyr, ggplot2
# Simulation parameters
n_per_group <- 25
true_mean <- 600 # kg, both groups
true_sd <- 40
# Generate ONE sample where null is true
set.seed(123)
control <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
treatment <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
# Combine into data frame
sample_data <- tibble(
weight = c(control, treatment),
group = rep(c("Control", "Treatment"), each = n_per_group)
)
# Visualize
p1 <- ggplot(sample_data, aes(x = group, y = weight, fill = group)) +
geom_boxplot(alpha = 0.6, outlier.shape = NA) +
geom_jitter(width = 0.15, alpha = 0.6, size = 2) +
scale_fill_manual(values = c("Control" = "#E69F00", "Treatment" = "#56B4E9")) +
labs(
title = "One Sample: Weights Under the Null (No True Difference)",
subtitle = sprintf("Control mean: %.1f kg | Treatment mean: %.1f kg",
mean(control), mean(treatment)),
y = "Final Weight (kg)",
x = "Group"
) +
theme(legend.position = "none") +
ylim(500, 700)
print(p1)
Code
# Run t-test
test_result <- t.test(control, treatment)
cat(sprintf("\nObserved difference: %.1f kg\n", mean(treatment) - mean(control)))
Observed difference: 5.4 kg
Code
cat(sprintf("P-value: %.4f\n", test_result$p.value))
P-value: 0.6100
Even though the null hypothesis is true (both groups have the same mean), we observe a difference just due to random sampling. The p-value tells us how “surprising” this observed difference would be if the null were true.
12.2.1 The Distribution of P-Values Under the Null
Now, what happens if we repeat this experiment 1,000 times, always with no true difference?
Code
# Simulate 1000 experiments where null is true
n_simulations <- 1000
simulate_study <- function() {
control <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
treatment <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
t.test(control, treatment)$p.value
}
p_values <- replicate(n_simulations, simulate_study())
# Visualize distribution
p2 <- tibble(p_value = p_values) %>%
ggplot(aes(x = p_value)) +
geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7, color = "white") +
geom_vline(xintercept = 0.05, color = "red", linetype = "dashed", linewidth = 1.2) +
annotate("text", x = 0.05, y = 70, label = "α = 0.05",
color = "red", hjust = -0.1, size = 5) +
labs(
title = "Distribution of P-Values When the Null Hypothesis is TRUE",
subtitle = sprintf("%d simulations: each time, both groups truly have mean = %d kg",
n_simulations, true_mean),
x = "P-value",
y = "Count (out of 1,000 studies)"
) +
scale_x_continuous(breaks = seq(0, 1, 0.1)) +
theme_minimal(base_size = 13)
print(p2)
Code
# Calculate proportion "significant"
prop_sig <- mean(p_values < 0.05)
cat(sprintf("\nProportion of p-values < 0.05: %.3f (expected: 0.05)\n", prop_sig))
Proportion of p-values < 0.05: 0.062 (expected: 0.05)
Code
cat(sprintf("Out of %d studies where null is TRUE, %d (%.1f%%) would be \"significant\" at p < 0.05\n",
            n_simulations, sum(p_values < 0.05), 100 * prop_sig))
Out of 1000 studies where null is TRUE, 62 (6.2%) would be "significant" at p < 0.05
When the null hypothesis is true, p-values are uniformly distributed between 0 and 1. This means about 5% of studies will produce p < 0.05 purely by chance—this is the Type I error rate (false positive rate).
If you use α = 0.05 as your threshold, you’re accepting that 5% of the time, you’ll incorrectly reject a true null hypothesis.
12.3 Common P-Value Misconceptions
Let’s address the most common misinterpretations with specific examples from animal science.
12.3.1 Misconception 1: “p = 0.03 means 3% chance null is true”
WRONG. The p-value is \(P(\text{data} \mid H_0)\), not \(P(H_0 \mid \text{data})\).
Example: In a swine growth study, you find p = 0.03 when comparing two diets. This means:
- ✅ Correct: “If both diets were truly identical, we’d see a difference this large in only 3% of similar studies, just by chance.”
- ❌ Incorrect: “There’s a 3% chance the diets are really the same.”
To know \(P(H_0 \mid \text{data})\), you’d need to know the prior probability that \(H_0\) is true—that requires Bayesian analysis.
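A quick back-of-the-envelope version of that Bayesian calculation, with assumed numbers (a 50/50 prior on \(H_0\) and 80% power are illustrative choices, not facts from the text):

```r
# Converting alpha and power into P(H0 | "significant") requires a prior.
prior_H0 <- 0.5    # assume half of tested treatments truly do nothing
alpha    <- 0.05   # P(significant | H0 true)
power    <- 0.80   # P(significant | H0 false)
p_sig <- alpha * prior_H0 + power * (1 - prior_H0)   # total P(significant)
p_H0_given_sig <- alpha * prior_H0 / p_sig           # Bayes' theorem
round(p_H0_given_sig, 3)  # 0.059
```

Change the prior (say, to 0.9 for a long-shot hypothesis) and \(P(H_0 \mid \text{significant})\) changes dramatically, even though the p-value does not.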
12.3.2 Misconception 2: “p = 0.06 means no effect”
WRONG. Absence of evidence is not evidence of absence.
Example: You test a new probiotic in beef cattle and get p = 0.06 for weight gain.
- ❌ Incorrect: “The probiotic doesn’t work.”
- ✅ Correct: “Our data don’t provide strong evidence against the null hypothesis. The effect might be real but small, or our sample size might be too small to detect it.”
Consider these two scenarios that both give p = 0.06:
Code
library(patchwork)  # needed below for combining plots with `+`
set.seed(456)
# Scenario A: Large sample, small effect
n_large <- 100
effect_small <- 3 # kg difference
cattle_a_control <- rnorm(n_large, mean = 600, sd = 40)
cattle_a_treat <- rnorm(n_large, mean = 600 + effect_small, sd = 40)
# Scenario B: Small sample, large effect
n_small <- 15
effect_large <- 15 # kg difference
cattle_b_control <- rnorm(n_small, mean = 600, sd = 40)
cattle_b_treat <- rnorm(n_small, mean = 600 + effect_large, sd = 40)
# T-tests
p_a <- t.test(cattle_a_treat, cattle_a_control)$p.value
p_b <- t.test(cattle_b_treat, cattle_b_control)$p.value
# Visualize
data_a <- tibble(weight = c(cattle_a_control, cattle_a_treat),
group = rep(c("Control", "Probiotic"), each = n_large),
scenario = "A")
data_b <- tibble(weight = c(cattle_b_control, cattle_b_treat),
group = rep(c("Control", "Probiotic"), each = n_small),
scenario = "B")
plot_a <- ggplot(data_a, aes(x = group, y = weight, fill = group)) +
geom_boxplot(alpha = 0.6) +
geom_jitter(width = 0.1, alpha = 0.4, size = 1.5) +
labs(title = sprintf("Scenario A: Large Sample, Small Effect\nn=%d per group, p=%.3f",
n_large, p_a),
y = "Weight (kg)", x = "") +
theme(legend.position = "none") +
ylim(450, 750)
plot_b <- ggplot(data_b, aes(x = group, y = weight, fill = group)) +
geom_boxplot(alpha = 0.6) +
geom_jitter(width = 0.1, alpha = 0.4, size = 2) +
labs(title = sprintf("Scenario B: Small Sample, Large Effect\nn=%d per group, p=%.3f",
n_small, p_b),
y = "Weight (kg)", x = "") +
theme(legend.position = "none") +
ylim(450, 750)
plot_a + plot_b
Both studies have p ≈ 0.05-0.07, but they tell very different stories! Always report effect sizes and confidence intervals, not just p-values.
12.3.3 Misconception 3: “p < 0.001 means a large/important effect”
WRONG. Statistical significance ≠ practical significance.
Example: In a massive database of several hundred thousand pigs, you find that pigs born on Mondays weigh 0.3 kg less at market than pigs born on other days (p < 0.001).
- Statistically significant: Yes! With huge sample sizes, even tiny effects become “significant.”
- Practically significant: Probably not. A 0.3 kg difference is unlikely to matter economically.
Code
# Simulation: huge sample, tiny effect
# With sd = 20 kg, reliably detecting a 0.3 kg difference takes roughly
# 100,000 pigs per group
set.seed(789)
n_huge <- 100000
tiny_effect <- 0.3 # kg
monday_pigs <- rnorm(n_huge, mean = 280, sd = 20)
other_pigs <- rnorm(n_huge, mean = 280 + tiny_effect, sd = 20)
test_huge <- t.test(other_pigs, monday_pigs)
cat(sprintf("Sample size: %d per group\n", n_huge))
Sample size: 100000 per group
Code
cat(sprintf("Mean difference: %.2f kg\n", mean(other_pigs) - mean(monday_pigs)))
cat(sprintf("P-value: %.2e\n", test_huge$p.value))
cat(sprintf("But effect size: %.2f kg (%.1f%% of mean weight)\n",
            tiny_effect, 100 * tiny_effect / 280))
But effect size: 0.30 kg (0.1% of mean weight)
When evaluating any result, ask two separate questions:
- Is it statistically significant? (p-value)
- Is it practically significant? (effect size, confidence intervals, domain knowledge)
A difference can be statistically significant without being biologically or economically meaningful.
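One simple way to quantify practical relevance is a standardized effect size. A sketch using the Monday-pig numbers (Cohen's d with the 20 kg standard deviation assumed above):

```r
# Cohen's d: mean difference scaled by the standard deviation.
mean_diff <- 0.3   # kg, the Monday-pig difference
sd_weight <- 20    # kg, assumed weight standard deviation
d <- mean_diff / sd_weight
d  # 0.015 -> far below even a "small" effect (d = 0.2 by convention)
```

Unlike the p-value, d does not grow with sample size, so it keeps huge datasets honest about how big an effect actually is.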
12.4 The Arbitrary Nature of p < 0.05
Where did p < 0.05 come from? It was popularized by statistician R.A. Fisher in the 1920s as a convenient convention, not a law of nature. He even cautioned against treating it as a bright-line rule.
12.4.1 The Problem with Bright Lines
Consider three studies comparing the same feed additive:
- Study A: p = 0.049 → “Significant! The additive works!”
- Study B: p = 0.051 → “Not significant. No evidence it works.”
- Study C: p = 0.048 → “Significant! Definitely works!”
Does it really make sense that Study A and C lead to completely different conclusions than Study B, when the p-values are nearly identical?
Code
# Visualize the arbitrary threshold
tibble(
study = c("A", "B", "C"),
p_value = c(0.049, 0.051, 0.048),
significant = p_value < 0.05
) %>%
ggplot(aes(x = study, y = p_value, fill = significant)) +
geom_col(alpha = 0.7) +
geom_hline(yintercept = 0.05, linetype = "dashed", color = "red", linewidth = 1) +
geom_text(aes(label = sprintf("p = %.3f", p_value)), vjust = -0.5, size = 5) +
annotate("text", x = 2, y = 0.05, label = "α = 0.05 threshold",
color = "red", vjust = -0.5, size = 4) +
scale_fill_manual(values = c("TRUE" = "darkgreen", "FALSE" = "gray50"),
labels = c("TRUE" = "Significant", "FALSE" = "Not Significant")) +
labs(
title = "The Arbitrary Nature of p < 0.05",
subtitle = "Should Study B really lead to a completely different conclusion?",
x = "Study",
y = "P-value",
fill = ""
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")
12.4.2 Modern Perspectives
Many scientific fields are moving away from rigid thresholds:
- Report exact p-values (e.g., p = 0.03, not just “p < 0.05”)
- Focus on effect sizes and confidence intervals more than p-values
- Consider p-values as continuous measures of evidence, not binary decisions
- Some journals now ban the term “statistically significant” entirely
We’ll calculate p-values because they’re standard in animal science, but we’ll always interpret them alongside:
- Effect sizes (how big is the difference?)
- Confidence intervals (what’s the range of plausible values?)
- Practical significance (does the effect size matter in the real world?)
13 Study Design: Observational vs Experimental
Not all research is created equal. The design of a study fundamentally determines what conclusions you can draw, particularly about causation.
13.1 The Gold Standard: Causation vs Association
- Association (correlation): Two variables change together, but we don’t know if one causes the other
- Causation: Changing one variable causes changes in the other
The single most important question when reading research: Can this study establish causation, or only association?
13.1.1 Observational Studies
In an observational study, the researcher simply observes and records data without manipulating any variables. You measure what’s already happening naturally.
Examples in animal science:
- Cross-sectional survey: Measure backfat thickness in pigs across different farms at one point in time
- Cohort study: Follow beef cattle over time and record which ones develop health issues
- Case-control study: Compare diet history of cattle with vs without liver abscesses
Strengths:
- Can study things we can’t (or shouldn’t) experimentally manipulate
- Often cheaper and faster than experiments
- Reflects real-world conditions
- Good for exploratory research and hypothesis generation
Limitations:
- Cannot establish causation (only association)
- Confounding variables can bias results (more on this below)
- Difficult to control for all alternative explanations
13.1.1.1 Example: Farm Size and Pig Health
Imagine you survey 100 swine farms and find that larger farms have lower mortality rates.
Can you conclude that increasing farm size causes better health outcomes?
No! Many confounding variables could explain this:
- Larger farms might have better veterinary care
- They might use better biosecurity protocols
- They might have more experienced managers
- They might be in regions with different disease pressures
- Healthier farms might have expanded to become larger (reverse causation!)
Code
# Simulate observational data with confounding
set.seed(321)
n_farms <- 100
# Management quality is a confounder
management_quality <- rnorm(n_farms, mean = 50, sd = 15)
# Better-managed farms tend to be larger (confounding)
farm_size <- 500 + 8 * management_quality + rnorm(n_farms, mean = 0, sd = 200)
# Mortality is affected by management quality, NOT farm size directly
mortality_rate <- 8 - 0.08 * management_quality + rnorm(n_farms, mean = 0, sd = 1.5)
mortality_rate <- pmax(0, mortality_rate) # Can't be negative
farm_data <- tibble(
farm_id = 1:n_farms,
size = farm_size,
mortality = mortality_rate,
management = management_quality
)
# Naive analysis (ignoring confounding)
p3 <- ggplot(farm_data, aes(x = size, y = mortality)) +
geom_point(alpha = 0.6, size = 3, color = "steelblue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linewidth = 1.2) +
labs(
title = "Observational Study: Farm Size vs Mortality Rate",
subtitle = "Appears that larger farms have lower mortality - but is this causal?",
x = "Farm Size (number of sows)",
y = "Mortality Rate (%)"
)
print(p3)
Code
cor_size_mort <- cor(farm_data$size, farm_data$mortality)
cat(sprintf("\nCorrelation between farm size and mortality: %.3f\n", cor_size_mort))
Correlation between farm size and mortality: -0.383
Code
cat("But this is driven by a confounder: management quality!\n")
But this is driven by a confounder: management quality!
This is association, not causation. To establish that farm size itself affects mortality, you’d need an experimental design.
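Short of an experiment, one partial remedy is statistical adjustment for measured confounders. A sketch (re-creating the simulated farms with the same seed and generating process as above) showing that the farm-size association shrinks once management quality enters the model:

```r
# Re-simulate the farm data, then compare the naive and the
# confounder-adjusted regression coefficients for farm size.
set.seed(321)
n_farms <- 100
management <- rnorm(n_farms, mean = 50, sd = 15)
size <- 500 + 8 * management + rnorm(n_farms, mean = 0, sd = 200)
mortality <- pmax(0, 8 - 0.08 * management + rnorm(n_farms, mean = 0, sd = 1.5))
farms <- data.frame(size, mortality, management)
naive_coef <- coef(lm(mortality ~ size, data = farms))["size"]
adj_coef   <- coef(lm(mortality ~ size + management, data = farms))["size"]
c(naive = unname(naive_coef), adjusted = unname(adj_coef))  # adjusted is near zero
```

Adjustment only works for confounders you measured; unmeasured ones can still bias the estimate, which is why randomization remains the gold standard.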
13.1.2 Experimental Studies
In an experimental study, the researcher actively manipulates one or more variables (the “treatment” or “intervention”) and measures the effect on an outcome.
Key features:
- Researcher controls who receives which treatment
- Ideally uses randomization to assign treatments
- Controls other variables to isolate the effect of the treatment
- Can establish causation (if designed properly)
Examples in animal science:
- Feed trial: Randomly assign piglets to Diet A vs Diet B, measure growth
- Drug efficacy trial: Randomly assign cattle to antibiotic vs placebo, measure recovery
- Breeding experiment: Randomly assign boars to breeding groups, compare offspring traits
13.1.2.1 Example: Does Lysine Supplementation Improve Growth?
Study design: Take 60 pigs, randomly assign 30 to a control diet and 30 to a lysine-supplemented diet. Raise them identically otherwise. Measure final weight.
Code
# Simulate experimental data
set.seed(654)
n_pigs <- 60
# Randomly assign treatment
pig_data <- tibble(
pig_id = 1:n_pigs,
treatment = rep(c("Control", "Lysine"), each = n_pigs/2),
# Lysine truly increases weight by ~8 kg
final_weight = ifelse(treatment == "Control",
rnorm(n_pigs/2, mean = 115, sd = 12),
rnorm(n_pigs/2, mean = 115 + 8, sd = 12))
)
# Visualize
p4 <- ggplot(pig_data, aes(x = treatment, y = final_weight, fill = treatment)) +
geom_boxplot(alpha = 0.6, outlier.shape = NA) +
geom_jitter(width = 0.15, alpha = 0.5, size = 2.5) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 4, fill = "red") +
scale_fill_manual(values = c("Control" = "#E69F00", "Lysine" = "#009E73")) +
labs(
title = "Experimental Study: Effect of Lysine Supplementation on Pig Growth",
subtitle = "Random assignment allows causal inference",
y = "Final Weight (kg)",
x = "Treatment Group"
) +
theme(legend.position = "none")
print(p4)
Code
# Test for difference
exp_test <- t.test(final_weight ~ treatment, data = pig_data)
cat(sprintf("\nMean difference: %.2f kg\n",
mean(pig_data$final_weight[pig_data$treatment == "Lysine"]) -
mean(pig_data$final_weight[pig_data$treatment == "Control"])))
Mean difference: 6.86 kg
Code
cat(sprintf("P-value: %.4f\n", exp_test$p.value))
P-value: 0.0044
Code
cat("\nBecause we RANDOMLY assigned treatments, we can conclude:\n")
Because we RANDOMLY assigned treatments, we can conclude:
Code
cat("Lysine supplementation CAUSES increased growth in pigs.\n")
Lysine supplementation CAUSES increased growth in pigs.
Why can we claim causation here?
Because of randomization (discussed in detail in the next section). Random assignment ensures that the two groups are equivalent on average at the start—any difference at the end must be due to the treatment.
13.2 Confounding Variables
A confounding variable (or confounder) is a variable that:
- Is associated with the treatment/exposure
- Independently affects the outcome
- Is not on the causal pathway between treatment and outcome
Confounding creates spurious associations—relationships that appear causal but aren’t.
13.2.1 The Classic Example: Ice Cream and Drowning
This non-agricultural example illustrates confounding perfectly:
Observation: Ice cream sales are strongly correlated with drowning deaths.
Conclusion: Ice cream causes drowning?! Should we ban ice cream to save lives?
Reality: Both are caused by a confounder: temperature/summer season
- Hot weather → people buy ice cream
- Hot weather → people go swimming → more drownings
Ice cream and drowning are associated but not causally related.
13.2.2 Agricultural Example: Pasture Type and Weight Gain
Scenario: You visit 20 beef farms. Some use Pasture A (fescue), others use Pasture B (mixed grass). You record average daily gain (ADG) for cattle on each farm.
Observation: Cattle on Pasture A have higher ADG.
Can you conclude Pasture A is better?
Probably not! Possible confounders:
- Farm quality: Better-managed farms might choose Pasture A (and also have better nutrition, genetics, health)
- Soil quality: Farms with better soil grow Pasture A, but soil quality also affects other forages
- Region: Pasture A might be used in regions with better climate for cattle
- Genetics: Farms using Pasture A might also use superior genetics
Code
# Simulate pasture study with confounding
set.seed(987)
n_farms <- 20
pasture_data <- tibble(
farm = 1:n_farms,
pasture_type = rep(c("Fescue", "Mixed Grass"), each = 10)
) %>%
mutate(
# Farm quality is confounder: better farms choose fescue
farm_quality = ifelse(pasture_type == "Fescue",
rnorm(n_farms/2, mean = 75, sd = 8),
rnorm(n_farms/2, mean = 60, sd = 8)),
# ADG depends on farm quality, NOT pasture type!
adg = 1.2 + 0.012 * farm_quality + rnorm(n_farms, mean = 0, sd = 0.15)
)
# Visualize the confounding
p6 <- ggplot(pasture_data, aes(x = farm_quality, y = adg, color = pasture_type, shape = pasture_type)) +
geom_point(size = 4, alpha = 0.8) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
scale_color_manual(values = c("Fescue" = "#D55E00", "Mixed Grass" = "#0072B2")) +
labs(
title = "Confounding Example: Farm Quality Affects Both Pasture Choice and ADG",
subtitle = "Better farms choose fescue AND have higher ADG (but pasture isn't the cause)",
x = "Farm Quality Score",
y = "Average Daily Gain (kg/day)",
color = "Pasture Type",
shape = "Pasture Type"
) +
theme(legend.position = "top")
print(p6)
Code
# Naive comparison
pasture_data %>%
group_by(pasture_type) %>%
summarise(mean_adg = mean(adg), .groups = 'drop') %>%
knitr::kable(digits = 3, col.names = c("Pasture Type", "Mean ADG (kg/day)"))

| Pasture Type | Mean ADG (kg/day) |
|---|---|
| Fescue | 2.028 |
| Mixed Grass | 2.026 |
In this particular simulated draw the two group means happen to land close together, but the mechanism is what matters: any apparent fescue advantage arises because better farms choose fescue, not because fescue itself is superior.
How can we combat confounding?
In observational studies:
- Statistical adjustment (multiple regression, matching, stratification)
- Careful measurement of potential confounders
- Acknowledge limitations in conclusions
In experimental studies:
- Randomization (the gold standard—discussed next!)
- Blocking/stratification
- Standardizing all other conditions
14 Randomized Controlled Trials (RCTs)
The randomized controlled trial (RCT) is the gold standard for establishing causation. It’s an experimental design where:
- Participants (animals) are randomly assigned to treatment groups
- One group receives the intervention, another serves as a control
- All other conditions are kept as similar as possible
- Outcomes are measured and compared
14.1 Why Randomization is Powerful
Random assignment ensures that treatment groups are balanced on all variables—both measured and unmeasured—on average.
This is crucial because:
- You can’t measure every potential confounder
- You don’t always know what the confounders are
- Randomization balances them automatically (in expectation)
14.1.1 Mathematical Intuition
When you randomly assign \(n\) animals to groups, every animal has an equal probability of being in any group. This means:
\[ E[\text{Confounder}_{\text{Treatment}}] = E[\text{Confounder}_{\text{Control}}] \]
Where \(E[\cdot]\) denotes expected value (average across many repetitions).
In plain English: On average, the treatment and control groups will have the same distribution of age, weight, genetics, health status, etc.—even if you don’t measure these variables!
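This "balance in expectation" can be checked directly by simulation (illustrative numbers): randomly split the same 60 animals many times and track the between-group difference in a pre-existing trait:

```r
# Across many random splits, the mean difference in a baseline covariate
# between the two groups averages out to approximately zero.
set.seed(42)
baseline <- rnorm(60, mean = 25, sd = 4)   # e.g., initial weight (kg)
diffs <- replicate(10000, {
  grp <- sample(rep(c("Treatment", "Control"), each = 30))
  mean(baseline[grp == "Treatment"]) - mean(baseline[grp == "Control"])
})
round(mean(diffs), 2)  # approximately 0
```

Any single randomization can still be imbalanced by chance; it is the average over repetitions that is guaranteed, which is why blocking on known important covariates is often added on top.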
14.2 Key Features of RCTs
14.2.1 1. Random Assignment
Not “haphazard” or “arbitrary”—random using a chance mechanism (coin flip, random number generator, etc.).
Example: 60 pigs, 30 to each group
Code
set.seed(2025)
# Start with 60 pigs with various characteristics
pigs <- tibble(
pig_id = 1:60,
initial_weight = rnorm(60, mean = 25, sd = 4),
age_days = round(runif(60, min = 50, max = 70)),
sex = sample(c("Male", "Female"), 60, replace = TRUE),
litter = sample(1:15, 60, replace = TRUE)
)
# RANDOMLY assign to treatment
pigs <- pigs %>%
mutate(treatment = sample(rep(c("Control", "Probiotic"), each = 30)))
# Check balance
pigs %>%
group_by(treatment) %>%
summarise(
n = n(),
mean_weight = mean(initial_weight),
mean_age = mean(age_days),
prop_male = mean(sex == "Male"),
.groups = 'drop'
) %>%
knitr::kable(digits = 2,
col.names = c("Treatment", "N", "Mean Weight (kg)",
"Mean Age (days)", "Proportion Male"))

| Treatment | N | Mean Weight (kg) | Mean Age (days) | Proportion Male |
|---|---|---|---|---|
| Control | 30 | 25.19 | 60.97 | 0.47 |
| Probiotic | 30 | 25.89 | 59.77 | 0.37 |
Notice how the groups are similar on all measured characteristics—that’s randomization working!
14.2.2 2. Control Group
The control group provides the counterfactual: what would have happened without the treatment?
Types of controls:
- Negative control: No treatment (or placebo)
- Positive control: Standard treatment (if testing a new alternative)
- Multiple controls: Compare several treatments
14.2.3 3. Blinding (when possible)
Blinding means keeping the treatment assignment hidden to reduce bias:
- Single-blind: Animals (or caretakers) don’t know which group receives which treatment
- Double-blind: Neither caretakers nor researchers analyzing data know
Example: In a drug trial for cattle, identical-looking pills (one with drug, one placebo) prevent the farm workers from treating groups differently.
Note: Blinding isn’t always possible in animal science (e.g., you can’t hide which diet an animal is eating), but controlling for observer bias is still important.
14.2.4 4. Standardization
Keep all other conditions identical between groups:
- Same housing
- Same feeding schedule
- Same environmental conditions
- Same outcome measurement procedures
14.3 Example RCT: Feed Additive Trial in Swine
Research question: Does a novel feed additive improve average daily gain (ADG) in growing pigs?
Design:
- Population: 120 pigs (60-day-old, weaned)
- Randomization: Randomly assign 60 to control diet, 60 to additive diet
- Control: Standard corn-soybean diet
- Treatment: Same diet + 0.5% additive
- Blinding: Farm workers don’t know which pens get which diet (feed is labeled A/B)
- Standardization: All pigs housed in identical pens, same schedule, same health protocols
- Duration: 90 days
- Outcome: Average daily gain (kg/day)
Code
set.seed(111)
# Simulate RCT data
rct_pigs <- tibble(
pig_id = 1:120,
# Randomize first
treatment = sample(rep(c("Control", "Additive"), each = 60)),
# Baseline characteristics are balanced (due to randomization)
initial_weight = rnorm(120, mean = 20, sd = 3),
# Outcome: additive truly improves ADG by 0.05 kg/day
adg = ifelse(treatment == "Control",
rnorm(60, mean = 0.75, sd = 0.10),
rnorm(60, mean = 0.75 + 0.05, sd = 0.10))
)
# Visualize
p7 <- ggplot(rct_pigs, aes(x = treatment, y = adg, fill = treatment)) +
geom_boxplot(alpha = 0.6, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.3, size = 1.5) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 5, fill = "white", color = "black") +
scale_fill_manual(values = c("Control" = "#E69F00", "Additive" = "#56B4E9")) +
labs(
title = "RCT Results: Feed Additive Effect on Average Daily Gain",
subtitle = "White diamond = group mean",
y = "Average Daily Gain (kg/day)",
x = "Treatment Group"
) +
theme(legend.position = "none")
print(p7)
Code
# Statistical test
rct_test <- t.test(adg ~ treatment, data = rct_pigs)

effect_size <- mean(rct_pigs$adg[rct_pigs$treatment == "Additive"]) -
  mean(rct_pigs$adg[rct_pigs$treatment == "Control"])

cat(sprintf("\nControl mean ADG: %.3f kg/day\n",
    mean(rct_pigs$adg[rct_pigs$treatment == "Control"])))
Control mean ADG: 0.767 kg/day
Code
cat(sprintf("Additive mean ADG: %.3f kg/day\n",
    mean(rct_pigs$adg[rct_pigs$treatment == "Additive"])))
Additive mean ADG: 0.786 kg/day
Code
cat(sprintf("Difference: %.3f kg/day\n", effect_size))
Difference: 0.019 kg/day
Code
cat(sprintf("95%% CI: [%.3f, %.3f]\n", rct_test$conf.int[1], rct_test$conf.int[2]))
95% CI: [-0.019, 0.056]
Code
cat(sprintf("P-value: %.4f\n", rct_test$p.value))
P-value: 0.3196
Code
cat("\n✓ Because of RANDOM ASSIGNMENT, any systematic difference in ADG\n")
cat("  can be attributed to the additive rather than to confounding.\n")
✓ Because of RANDOM ASSIGNMENT, any systematic difference in ADG
  can be attributed to the additive rather than to confounding.
Note the honest reading of this particular run: the simulation built in a true effect of 0.05 kg/day, but the observed difference was only 0.019 kg/day (95% CI [-0.019, 0.056], p = 0.32). Randomization licenses a causal interpretation of whatever difference we observe; it does not guarantee that a real effect will reach statistical significance. With 60 pigs per group, this trial was simply underpowered for an effect of this size.
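One further check worth knowing: randomization should balance baseline covariates, and this is easy to verify before interpreting the outcome comparison. Here is a sketch using the `rct_pigs` data simulated above (it assumes the tidyverse functions already loaded in this document):

```r
# Check covariate balance: summarize baseline weight by treatment arm
rct_pigs |>
  group_by(treatment) |>
  summarise(n = n(),
            mean_initial_wt = mean(initial_weight),
            sd_initial_wt = sd(initial_weight))

# Baseline weights were measured BEFORE treatment, so any difference
# between arms can only be due to chance
t.test(initial_weight ~ treatment, data = rct_pigs)
```

If baseline weights differed markedly between arms, that would signal a problem with the randomization procedure, not an effect of the additive.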
14.4 Limitations of RCTs
Despite being the gold standard, RCTs have limitations:
- Cost: Experiments are expensive and time-consuming
- Ethics: Some treatments can’t be tested experimentally (e.g., exposing animals to disease)
- Practicality: Long-term outcomes (years) may be infeasible
- External validity: Controlled conditions may not reflect real-world settings
- Sample size: May need large numbers to detect small effects
When RCTs aren’t possible, observational studies remain valuable—but we must be cautious about causal claims and carefully consider confounding.
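The sample-size point can be made concrete with base R's `power.t.test()`. Using the values from the simulated trial above (a true difference of 0.05 kg/day, sd = 0.10), we can ask how many pigs per group a well-powered trial would need:

```r
# Pigs per group needed to detect a 0.05 kg/day difference in ADG
# (sd = 0.10) with 80% power at the conventional 5% significance level
pwr <- power.t.test(delta = 0.05, sd = 0.10, sig.level = 0.05, power = 0.80)
ceiling(pwr$n)  # roughly 64 pigs per group

# Power actually achieved with 60 pigs per group (a bit under 0.80)
power.t.test(n = 60, delta = 0.05, sd = 0.10, sig.level = 0.05)$power
```

With 60 pigs per group the simulated trial had roughly a 78% chance of detecting the true effect, i.e., about a one-in-five chance of missing it, which is exactly what happened in the run shown above.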
15 Summary and Key Takeaways
Congratulations! You’ve completed the foundational week of statistical thinking. Let’s review the key concepts:
15.1 Main Concepts
15.2 Looking Ahead
Next week, we’ll move from big-picture philosophy to practical tools: how to describe and summarize data using descriptive statistics and exploratory data analysis. We’ll learn to:
- Calculate and interpret measures of central tendency and variability
- Visualize distributions effectively
- Identify outliers and unusual patterns
- Create publication-quality summary tables
These foundational skills will prepare us for inferential statistics (hypothesis testing, confidence intervals, regression) in subsequent weeks.
15.3 Reflection Questions
Before next week’s class, think about:
- Find a recent paper in your area of animal science. Is it observational or experimental? If observational, what are potential confounders?
- Look at the p-values reported in the paper. Are effect sizes and confidence intervals also reported? If not, what information is missing?
- If the paper claims causation, is that claim justified by the study design?
15.4 Additional Resources
15.4.1 Recommended Reading
- ASA Statement on P-values and Statistical Significance (2016) – required reading
- Greenland et al. (2016): “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations” – comprehensive list of common errors
- Ioannidis (2005): “Why Most Published Research Findings Are False” – provocative but important
15.4.2 Videos
- StatQuest by Josh Starmer (YouTube): “P-values, clearly explained”
- “Dance of the p-values” (YouTube): Visual demonstration of p-value behavior
15.4.3 Books
- The Lady Tasting Tea by David Salsburg – history of statistics, very readable
- Naked Statistics by Charles Wheelan – conceptual introduction, no equations
15.5 Session Info
Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.4.0 patchwork_1.3.2 broom_1.0.7 lubridate_1.9.3
[5] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4
[9] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_4.0.0
[13] tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 xml2_1.3.6 lattice_0.22-6
[5] stringi_1.8.4 hms_1.1.3 digest_0.6.37 magrittr_2.0.3
[9] evaluate_1.0.1 grid_4.4.2 timechange_0.3.0 RColorBrewer_1.1-3
[13] fastmap_1.2.0 Matrix_1.7-1 jsonlite_1.8.9 backports_1.5.0
[17] mgcv_1.9-1 fansi_1.0.6 viridisLite_0.4.2 textshaping_0.4.0
[21] cli_3.6.4 rlang_1.1.6 splines_4.4.2 withr_3.0.2
[25] yaml_2.3.10 tools_4.4.2 tzdb_0.4.0 kableExtra_1.4.0
[29] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4 htmlwidgets_1.6.4
[33] pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.6 glue_1.8.0
[37] systemfonts_1.3.1 xfun_0.53 tidyselect_1.2.1 rstudioapi_0.17.1
[41] knitr_1.49 farver_2.1.2 nlme_3.1-166 htmltools_0.5.8.1
[45] labeling_0.4.3 rmarkdown_2.29 svglite_2.2.1 compiler_4.4.2
[49] S7_0.2.0
End of Week 1: Statistical Foundations and Study Design